"""# 谜题训练场开发任务

## 任务概述
你是一位资深程序员，我需要你帮我实现一个特定谜题的训练场环境类。这个类继承自`Basebootcamp`，用于生成谜题实例并验证解答。

## 背景说明
我正在开发一系列谜题训练场，每个训练场对应一个特定类型的谜题。训练场类命名为`{PuzzleName}bootcamp`，其中`PuzzleName`是谜题的名称。

每个训练场类主要提供两个核心功能：
1. 生成该谜题类型的问题实例
2. 验证用户对问题的回答是否正确

## 技术接口规范

### 类方法实现要求

```python
class {PuzzleName}bootcamp(Basebootcamp):
    def __init__(self, **params):
        \"\"\"
        请你自定义params，以保存该puzzle相关的参数，例如网格大小等，参数配有默认值
        \"\"\"
        pass

    def case_generator(self):
        \"\"\"
        生成谜题实例，提示：为保证谜题有解，可以先生成结果再对结果处理得到谜题
        返回：一个可JSON序列化的字典（避免包含set等无法通过json.dumps处理的数据结构）
        \"\"\"
        pass

    @staticmethod
    def prompt_func(question_case) -> str:
        \"\"\"
        将case_generator生成的谜题实例转换为文本形式的问题，问题中包含问题背景、对谜题规则的介绍、具体要解决的谜题实例、期望最终答案的格式，
        例如：你是xxxx，请你解答yyyy，规则如下：yyyy，最终答案放置在：zzzzz
        注意：规则应当描述详细，包括任务背景、具体任务操作规则、对题目格式和答案格式的含义介绍等

        参数:
            question_case: 由case_generator生成的谜题实例

        返回:
            str: 格式化的问题字符串

        注意:
            1. 需考虑问题的格式，以便后续能正确提取
            2. 问题描述中应包含期望的答案格式说明，以便后续能正确提取，为了避免抽取时匹配出干扰项，请要求模型将答案放在特定标签，如[answer] [/answer]内
        \"\"\"
        pass

    @staticmethod
    def extract_output(output):
        \"\"\"
        从LLM的回复中提取符合格式要求的答案，如有多个，请抽取最后一个，避免使用re.search等只抽取第一个结果的方式。

        参数:
            output: LLM的完整输出（包含原始问题和回答）

        返回:
            提取的答案，若未找到符合格式的答案则返回None
        \"\"\"
        pass

    @classmethod
    def _verify_correction(cls, solution, identity):
        \"\"\"
        验证提取的答案是否正确，注意一个问题可以能有多个解，按照谜题规则进行检验，不要直接匹配可能的答案。

        参数:
            solution: extract_output提取的答案
            identity: case_generator生成的谜题实例

        返回:
            bool: 答案是否正确
        \"\"\"
        pass
```

### 验证评分方法（基类已实现）

```python
@classmethod
def verify_score(cls, model_output, identity:dict, format_score=0.1) -> float:
    \"\"\"
    验证输出结果并评分。

    参数:
        model_output: 模型的完整输出
        identity: 谜题实例（由case_generator生成）
        format_score: 答案格式正确时的基础分数

    返回:
        float: 评分结果（0-1之间）
    \"\"\"
    score = 0.
    try:
        extract_solution = cls.extract_output(model_output)
        if extract_solution is None:
            return score
        else:
            score = format_score # 格式正确时的基础分数
        if cls._verify_correction(extract_solution, identity):
            score = 1.  # 答案完全正确时的满分
    except Exception as e:
        # 处理异常情况
        pass
    return score
```

### 使用示例

```python
# 初始化谜题训练场
bootcamp = Puzzlebootcamp()

# 生成谜题实例
case = bootcamp.case_generator()

# 将谜题转换为文本问题
prompt = Puzzlebootcamp.prompt_func(case)

# 获取LLM对问题的解答
response = get_response(prompt, \"LLM\")

# 从完整对话中提取答案
extracted_output = Puzzlebootcamp.extract_output(prompt + response)

# 验证答案并评分
score = Puzzlebootcamp.verify_score(extracted_output, case)
```

## 你的任务
请根据以下谜题描述（谜题描述可能不完整，请先结合你的知识澄清规则），实现一个完整的谜题训练场类：

### 谜题描述
[Description]
# BBEH Word Sorting

This task is composed of two subtasks.

Subtask 1: This subtask is from the
[BIG-Bench Mistake](https://arxiv.org/pdf/2311.08516). This task involves
finding the first mistake in an existing chain-of-thought sequence, used to
answer a Word Sorting question in the original BBH dataset. In each example,
the target answer is either the number where the first mistake occurred, or
that there are no mistakes in the CoT sequence. These CoT sequences are
generated by prompting PaLM 2 Unicorn on the original BBH dataset at
temperature = 0. The newline is used as a stop token so that each
intermediate step can be prepended with *Thought 1:* , *Thought 2: *, etc.
Further information on the prompting and generation process can be found in
the original work.

Subtask 2: The second sub-task is sorting a list of words given a new
alphabet order (examples include: an alphabet order that is the same as
English but two letters are swapped in the order, an alphabet order that is
the same as English but one/two letters are moved to the beginning/end of the
order, or a completely new order). This task requires going against a strong
prior and sorting words in a non-typical way.

[/Description]
<example_1>

You are an expert in word sorting. You will be provided with a list of words and the steps taken, in the form of thoughts, to arrange them in alphabetical order.
Your job is to identify the first step that was a mistake in reasoning about the order of the words. This can be misunderstanding the order of
the alphabet or getting the word or letter wrong or missing a word entirely.
Q: Sort the following words alphabetically: List: apocalypse tote apostle gay kauffman dicta hog prowl aunt calm carboxy dishevel orifice tearful karate ellen
Thought 1: I should start by looking at the first letter of the words in the list. The first letter: \"apocalypse\": \"a\" (1). \"tote\": \"t\" (20). \"apostle\": \"a\" (1). \"gay\": \"g\" (7). \"kauffman\": \"k\" (11). \"dicta\": \"d\" (4). \"hog\": \"h\" (8). \"prowl\": \"p\" (16). \"aunt\": \"a\" (1). \"calm\": \"c\" (3). \"carboxy\": \"c\" (3). \"dishevel\": \"d\" (4). \"orifice\": \"o\" (15). \"tearful\": \"t\" (20). \"karate\": \"k\" (11). \"ellen\": \"e\" (5).
Thought 2: We now have: (1) [\"apocalypse\" ? \"apostle\" ? \"aunt\" ? \"calm\" ? \"carboxy\"] < (3) \"dicta\" < (4) [\"dishevel\" ? \"dicta\"] < (5) \"ellen\" < (7) \"gay\" < (8) \"hog\" < (11) [\"kauffman\" ? \"karate\"] < (15) \"orifice\" < (16) \"prowl\" < (20) [\"tearful\" ? \"tote\"].
Thought 3: Now let's sort this subpart [\"apocalypse\" ? \"apostle\" ? \"aunt\" ? \"calm\" ? \"carboxy\"] by looking at their second letters. The second letter: \"apocalypse\": \"p\" (16). \"apostle\": \"p\" (16). \"aunt\": \"u\" (21). \"calm\": \"a\" (1). \"carboxy\": \"a\" (1).
Thought 4: We now have: (1) \"calm\" < (16) [\"apocalypse\" ? \"apostle\"] < (21) \"aunt\" < (26) \"carboxy\" for the subpart. Hence, we have \"calm\" < [\"apocalypse\" ? \"apostle\"] < \"aunt\" < \"carboxy\" < \"dicta\" < [\"dishevel\" ? \"dicta\"] < \"ellen\" < \"gay\" < \"hog\" < [\"kauffman\" ? \"karate\"] < \"orifice\" < \"prowl\" < [\"tearful\" ? \"tote\"].
Thought 5: Now let's sort this subpart [\"apocalypse\" ? \"apostle\"] by looking at their third letters. The third letter: \"apocalypse\": \"o\" (15). \"apostle\": \"s\" (19).
Thought 6: We now have: (15) \"apocalypse\" < (19) \"apostle\" for the subpart. Hence, we have \"calm\" < \"apocalypse\" < \"apostle\" < \"aunt\" < \"carboxy\" < \"dicta\" < [\"dishevel\" ? \"dicta\"] < \"ellen\" < \"gay\" < \"hog\" < [\"kauffman\" ? \"karate\"] < \"orifice\" < \"prowl\" < [\"tearful\" ? \"tote\"].
Thought 7: Now let's sort this subpart [\"dishevel\" ? \"dicta\"] by looking at their second letters. The second letter: \"dishevel\": \"i\" (9). \"dicta\": \"i\" (9).
Thought 8: We now have: (9) [\"dishevel\" ? \"dicta\"] for the subpart. Hence, we have \"calm\" < \"apocalypse\" < \"apostle\" < \"aunt\" < \"carboxy\" < \"dicta\" < [\"dishevel\" ? \"dicta\"] < \"ellen\" < \"gay\" < \"hog\" < [\"kauffman\" ? \"karate\"] < \"orifice\" < \"prowl\" < [\"tearful\" ? \"tote\"].
Thought 9: Now let's sort this subpart [\"kauffman\" ? \"karate\"] by looking at their second letters. The second letter: \"kauffman\": \"a\" (1). \"karate\": \"a\" (1).
Thought 10: We now have: (1) [\"kauffman\" ? \"karate\"] for the subpart. Hence, we have \"calm\" < \"apocalypse\" < \"apostle\" < \"aunt\" < \"carboxy\" < \"dicta\" < [\"dishevel\" ? \"dicta\"] < \"ellen\" < \"gay\" < \"hog\" < [\"kauffman\" ? \"karate\"] < \"orifice\" < \"prowl\" < [\"tearful\" ? \"tote\"].
Thought 11: Now let's sort this subpart [\"tearful\" ? \"tote\"] by looking at their second letters. The second letter: \"tearful\": \"e\" (5). \"tote\": \"o\" (15).
Thought 12: We now have: (5) \"tearful\" < (15) \"tote\" for the subpart. Hence, we have \"calm\" < \"apocalypse\" < \"apostle\" < \"aunt\" < \"carboxy\" < \"dicta\" < [\"dishevel\" ? \"dicta\"] < \"ellen\" < \"gay\" < \"hog\" < [\"kauffman\" ? \"karate\"] < \"orifice\" < \"prowl\" < \"tearful\" < \"tote\".
Thought 13: I have now sorted all the words. The answer is apocalypse apostle aunt carboxy calm dicta dishevel ellen gay hog kauffman karate orifice prowl tearful tote
Q: Is there a mistake in this sequence? Write \"No\" if there are no mistakes, or the number N if there is a mistake in Thought N.

</example_1>
<example_2>

You are an expert in word sorting. You will be provided with a list of words and the steps taken, in the form of thoughts, to arrange them in alphabetical order.
Your job is to identify the first step that was a mistake in reasoning about the order of the words. This can be misunderstanding the order of
the alphabet or getting the word or letter wrong or missing a word entirely.
Q: Sort the following words alphabetically: List: shutdown beebread conveyor courtyard semantic aqueduct lacewing grown mummy infinitude tennessee taillight monarch neural aristocrat topheavy flagler deviate
Thought 1: I should start by looking at the first letter of the words in the list. The first letter: \"shutdown\": \"s\" (19). \"beebread\": \"b\" (2). \"conveyor\": \"c\" (3). \"courtyard\": \"c\" (3). \"semantic\": \"s\" (19). \"aqueduct\": \"a\" (1). \"lacewing\": \"l\" (12). \"grown\": \"g\" (7). \"mummy\": \"m\" (13). \"infinitude\": \"i\" (9). \"tennessee\": \"t\" (20). \"taillight\": \"t\" (20). \"monarch\": \"m\" (13). \"neural\": \"n\" (14). \"aristocrat\": \"a\" (1). \"topheavy\": \"t\" (20). \"flagler\": \"f\" (6). \"deviate\": \"d\" (4).
Thought 2: We now have: (1) [\"aristocrat\" ? \"aqueduct\"] < (2) \"beebread\" < (3) [\"conveyor\" ? \"courtyard\"] < (4) \"deviate\" < (6) \"flagler\" < (7) \"grown\" < (9) \"infinitude\" < (12) \"lacewing\" < (13) [\"mummy\" ? \"monarch\"] < (14) \"neural\" < (19) [\"semantic\" ? \"shutdown\"] < (20) [\"tennessee\" ? \"taillight\" ? \"topheavy\"].
Thought 3: Now let's sort this subpart [\"aristocrat\" ? \"aqueduct\"] by looking at their second letters. The second letter: \"aristocrat\": \"r\" (18). \"aqueduct\": \"q\" (17).
Thought 4: We now have: (17) \"aqueduct\" < (18) \"aristocrat\" for the subpart. Hence, we have \"aqueduct\" < \"aristocrat\" < \"beebread\" < [\"conveyor\" ? \"courtyard\"] < \"deviate\" < \"flagler\" < \"grown\" < \"infinitude\" < \"lacewing\" < [\"mummy\" ? \"monarch\"] < \"neural\" < [\"semantic\" ? \"shutdown\"] < [\"tennessee\" ? \"taillight\" ? \"topheavy\"].
Thought 5: Now let's sort this subpart [\"conveyor\" ? \"courtyard\"] by looking at their second letters. The second letter: \"conveyor\": \"o\" (15). \"courtyard\": \"o\" (15).
Thought 6: We now have: (15) [\"conveyor\" ? \"courtyard\"] for the subpart. Hence, we have \"aqueduct\" < \"aristocrat\" < \"beebread\" < [\"conveyor\" ? \"courtyard\"] < \"deviate\" < \"flagler\" < \"grown\" < \"infinitude\" < \"lacewing\" < [\"mummy\" ? \"monarch\"] < \"neural\" < [\"semantic\" ? \"shutdown\"] < [\"tennessee\" ? \"taillight\" ? \"topheavy\"].
Thought 7: Now let's sort this subpart [\"mummy\" ? \"monarch\"] by looking at their second letters. The second letter: \"mummy\": \"u\" (21). \"monarch\": \"o\" (15).
Thought 8: We now have: (15) \"monarch\" < (21) \"mummy\" for the subpart. Hence, we have \"aqueduct\" < \"aristocrat\" < \"beebread\" < [\"conveyor\" ? \"courtyard\"] < \"deviate\" < \"flagler\" < \"grown\" < \"infinitude\" < \"lacewing\" < \"monarch\" < \"mummy\" < \"neural\" < [\"semantic\" ? \"shutdown\"] < [\"tennessee\" ? \"taillight\" ? \"topheavy\"].
Thought 9: Now let's sort this subpart [\"semantic\" ? \"shutdown\"] by looking at their second letters. The second letter: \"semantic\": \"e\" (5). \"shutdown\": \"h\" (8).
Thought 10: We now have: (5) \"semantic\" < (8) \"shutdown\" for the subpart. Hence, we have \"aqueduct\" < \"aristocrat\" < \"beebread\" < [\"conveyor\" ? \"courtyard\"] < \"deviate\" < \"flagler\" < \"grown\" < \"infinitude\" < \"lacewing\" < \"monarch\" < \"mummy\" < \"neural\" < \"semantic\" < \"shutdown\" < [\"tennessee\" ? \"taillight\" ? \"topheavy\"].
Thought 11: Now let's sort this subpart [\"tennessee\" ? \"taillight\" ? \"topheavy\"] by looking at their second letters. The second letter: \"tennessee\": \"e\" (5). \"taillight\": \"a\" (1). \"topheavy\": \"o\" (15).
Thought 12: We now have: (1) \"taillight\" < (5) \"tennessee\" < (15) \"topheavy\" for the subpart. Hence, we have \"aqueduct\" < \"aristocrat\" < \"beebread\" < [\"conveyor\" ? \"courtyard\"] < \"deviate\" < \"flagler\" < \"grown\" < \"infinitude\" < \"lacewing\" < \"monarch\" < \"mummy\" < \"neural\" < \"semantic\" < \"shutdown\" < \"taillight\" < \"tennessee\" < \"topheavy\".
Thought 13: I have now sorted all the words. The answer is aqueduct aristocrat beebread conveyor courtyard deviate flagler grown infinitude lacewing mummy monarch neural semantic shutdown taillight tennessee topheavy
Q: Is there a mistake in this sequence? Write \"No\" if there are no mistakes, or the number N if there is a mistake in Thought N.

</example_2>
<example_3>

You are an expert in word sorting. You will be provided with a list of words and the steps taken, in the form of thoughts, to arrange them in alphabetical order.
Your job is to identify the first step that was a mistake in reasoning about the order of the words. This can be misunderstanding the order of
the alphabet or getting the word or letter wrong or missing a word entirely.
Q: Sort the following words alphabetically: List: balletomane brindisi mona secondhand blackstone effluvium beman swabby durward horse lament camille appetite mattock disputant
Thought 1: I should start by looking at the first letter of the words in the list. The first letter: \"balletomane\": \"b\" (2). \"brindisi\": \"b\" (2). \"mona\": \"m\" (13). \"secondhand\": \"s\" (19). \"blackstone\": \"b\" (2). \"effluvium\": \"e\" (5). \"beman\": \"b\" (2). \"swabby\": \"s\" (19). \"durward\": \"d\" (4). \"horse\": \"h\" (8). \"lament\": \"l\" (12). \"camille\": \"c\" (3). \"appetite\": \"a\" (1). \"mattock\": \"m\" (13). \"disputant\": \"d\" (4).
Thought 2: We now have: (1) \"appetite\" < (2) [\"balletomane\" ? \"brindisi\" ? \"blackstone\" ? \"beman\"] < (3) \"camille\" < (4) [\"durward\" ? \"disputant\"] < (5) \"effluvium\" < (8) \"horse\" < (12) \"lament\" < (13) [\"mona\" ? \"mattock\"] < (19) [\"secondhand\" ? \"swabby\"].
Thought 3: Now let's sort this subpart [\"balletomane\" ? \"brindisi\" ? \"blackstone\" ? \"beman\"] by looking at their second letters. The second letter: \"balletomane\": \"a\" (1). \"brindisi\": \"r\" (18). \"blackstone\": \"l\" (12). \"beman\": \"e\" (5).
Thought 4: We now have: (1) \"balletomane\" < (5) \"beman\" < (12) \"blackstone\" < (18) \"brindisi\" for the subpart. Hence, we have \"appetite\" < \"balletomane\" < \"beman\" < \"camille\" < [\"durward\" ? \"disputant\"] < \"effluvium\" < \"horse\" < \"lament\" < [\"mona\" ? \"mattock\"] < [\"secondhand\" ? \"swabby\"].
Thought 5: Now let's sort this subpart [\"durward\" ? \"disputant\"] by looking at their second letters. The second letter: \"durward\": \"u\" (21). \"disputant\": \"i\" (9).
Thought 6: We now have: (9) \"disputant\" < (21) \"durward\" for the subpart. Hence, we have \"appetite\" < \"balletomane\" < \"beman\" < \"camille\" < \"disputant\" < \"effluvium\" < \"horse\" < \"lament\" < [\"mona\" ? \"mattock\"] < [\"secondhand\" ? \"swabby\"] < \"durward\".
Thought 7: Now let's sort this subpart [\"mona\" ? \"mattock\"] by looking at their second letters. The second letter: \"mona\": \"o\" (15). \"mattock\": \"a\" (1).
Thought 8: We now have: (1) \"mattock\" < (15) \"mona\" for the subpart. Hence, we have \"appetite\" < \"balletomane\" < \"beman\" < \"camille\" < \"disputant\" < \"effluvium\" < \"horse\" < \"lament\" < \"mattock\" < [\"secondhand\" ? \"swabby\"] < \"durward\" < \"mona\".
Thought 9: Now let's sort this subpart [\"secondhand\" ? \"swabby\"] by looking at their second letters. The second letter: \"secondhand\": \"e\" (5). \"swabby\": \"w\" (23).
Thought 10: We now have: (5) \"secondhand\" < (23) \"swabby\" for the subpart. Hence, we have \"appetite\" < \"balletomane\" < \"beman\" < \"camille\" < \"disputant\" < \"effluvium\" < \"horse\" < \"lament\" < \"mattock\" < \"secondhand\" < \"swabby\" < \"durward\" < \"mona\".
Thought 11: I have now sorted all the words. The answer is appetite balletomane beman camille disputant effluvium horse lament mattock mona secondhand swabby durward
Q: Is there a mistake in this sequence? Write \"No\" if there are no mistakes, or the number N if there is a mistake in Thought N.

</example_3>
<example_4>

You are an expert in word sorting. You will be provided with a list of words and the steps taken, in the form of thoughts, to arrange them in alphabetical order.
Your job is to identify the first step that was a mistake in reasoning about the order of the words. This can be misunderstanding the order of
the alphabet or getting the word or letter wrong or missing a word entirely.
Q: Sort the following words alphabetically: List: fetal trilingual incredulous boulder ail ar deride right degas snowy verb origami penguin monsanto compute awash dyadic
Thought 1: I should start by looking at the first letter of the words in the list. The first letter: \"fetal\": \"f\" (6). \"trilingual\": \"t\" (20). \"incredulous\": \"i\" (9). \"boulder\": \"b\" (2). \"ail\": \"a\" (1). \"ar\": \"a\" (1). \"deride\": \"d\" (4). \"right\": \"r\" (18). \"degas\": \"d\" (4). \"snowy\": \"s\" (19). \"verb\": \"v\" (22). \"origami\": \"o\" (15). \"penguin\": \"p\" (16). \"monsanto\": \"m\" (13). \"compute\": \"c\" (3). \"awash\": \"a\" (1). \"dyadic\": \"d\" (4).
Thought 2: We now have: (1) [\"ail\" ? \"ar\" ? \"awash\"] < (2) \"boulder\" < (3) \"compute\" < (4) [\"deride\" ? \"degas\" ? \"dyadic\"] < (6) \"fetal\" < (9) \"incredulous\" < (13) \"monsanto\" < (15) \"origami\" < (16) \"penguin\" < (18) \"right\" < (19) \"snowy\" < (20) \"trilingual\" < (22) \"verb\".
Thought 3: Now let's sort this subpart [\"ail\" ? \"ar\" ? \"awash\"] by looking at their second letters. The second letter: \"ail\": \"i\" (9). \"ar\": \"r\" (18). \"awash\": \"w\" (23).
Thought 4: We now have: (9) \"ail\" < (18) \"ar\" < (23) \"awash\" for the subpart. Hence, we have [\"ail\" ? \"ar\" ? \"awash\"] < (2) \"boulder\" < (3) \"compute\" < (4) [\"deride\" ? \"degas\" ? \"dyadic\"] < (6) \"fetal\" < (9) \"incredulous\" < (13) \"monsanto\" < (15) \"origami\" < (16) \"penguin\" < (18) \"right\" < (19) \"snowy\" < (20) \"trilingual\" < (22) \"verb\".
Thought 5: Now let's sort this subpart [\"deride\" ? \"degas\" ? \"dyadic\"] by looking at their second letters. The second letter: \"deride\": \"e\" (5). \"degas\": \"e\" (5). \"dyadic\": \"y\" (25).
Thought 6: We now have: (5) [\"deride\" ? \"degas\"] < (25) \"dyadic\" for the subpart. Hence, we have [\"ail\" ? \"ar\" ? \"awash\"] < (2) \"boulder\" < (3) \"compute\" < (4) [\"deride\" ? \"degas\"] < (5) \"dyadic\" < (6) \"fetal\" < (9) \"incredulous\" < (13) \"monsanto\" < (15) \"origami\" < (16) \"penguin\" < (18) \"right\" < (19) \"snowy\" < (20) \"trilingual\" < (22) \"verb\".
Thought 7: Now let's sort this subpart [\"ail\" ? \"ar\" ? \"awash\"] by looking at their third letters. The third letter: \"ail\": \"l\" (12). \"ar\": \"r\" (18). \"awash\": \"s\" (19).
Thought 8: We now have: (12) \"ail\" < (18) \"ar\" < (19) \"awash\" for the subpart. Hence, we have \"ail\" < \"ar\" < \"awash\" < (2) \"boulder\" < (3) \"compute\" < (4) [\"deride\" ? \"degas\"] < (5) \"dyadic\" < (6) \"fetal\" < (9) \"incredulous\" < (13) \"monsanto\" < (15) \"origami\" < (16) \"penguin\" < (18) \"right\" < (19) \"snowy\" < (20) \"trilingual\" < (22) \"verb\".
Thought 9: Now let's sort this subpart [\"deride\" ? \"degas\"] by looking at their third letters. The third letter: \"deride\": \"r\" (18). \"degas\": \"s\" (19).
Thought 10: We now have: (18) \"deride\" < (19) \"degas\" for the subpart. Hence, we have \"ail\" < \"ar\" < \"awash\" < (2) \"boulder\" < (3) \"compute\" < (4) \"deride\" < (5) \"dyadic\" < (6) \"fetal\" < (9) \"incredulous\" < (13) \"monsanto\" < (15) \"origami\" < (16) \"penguin\" < (18) \"right\" < (19) \"snowy\" < (20) \"trilingual\" < (22) \"verb\".
Thought 11: I have now sorted all the words. The answer is ail ar awash boulder compute deride degas dyadic fetal incredulous monsanto origami penguin right snowy trilingual verb
Q: Is there a mistake in this sequence? Write \"No\" if there are no mistakes, or the number N if there is a mistake in Thought N.

</example_4>
<example_5>

You are an expert in word sorting. You will be provided with a list of words and the steps taken, in the form of thoughts, to arrange them in alphabetical order.
Your job is to identify the first step that was a mistake in reasoning about the order of the words. This can be misunderstanding the order of
the alphabet or getting the word or letter wrong or missing a word entirely.
Q: Sort the following words alphabetically: List: yves butt scribe seaworthy manumitted clapeyron lift blackguard reconnaissance agile woodcut lightfooted sawbelly meager geophysical wiseacre mathieu cognoscenti purposive flamboyant
Thought 1: I should start by looking at the first letter of the words in the list. The first letter: \"yves\": \"y\" (25). \"butt\": \"b\" (2). \"scribe\": \"s\" (19). \"seaworthy\": \"s\" (19). \"manumitted\": \"m\" (13). \"clapeyron\": \"c\" (3). \"lift\": \"l\" (12). \"blackguard\": \"b\" (2). \"reconnaissance\": \"r\" (18). \"agile\": \"a\" (1). \"woodcut\": \"w\" (23). \"lightfooted\": \"l\" (12). \"sawbelly\": \"s\" (19). \"meager\": \"m\" (13). \"geophysical\": \"g\" (7). \"wiseacre\": \"w\" (23). \"mathieu\": \"m\" (13). \"cognoscenti\": \"c\" (3). \"purposive\": \"p\" (16). \"flamboyant\": \"f\" (6).
Thought 2: We now have: (1) \"agile\" < (2) [\"butt\" ? \"blackguard\"] < (3) [\"clapeyron\" ? \"cognoscenti\"] < (6) \"flamboyant\" < (7) \"geophysical\" < (12) [\"lift\" ? \"lightfooted\"] < (13) [\"manumitted\" ? \"mathieu\"] < (16) \"purposive\" < (18) \"reconnaissance\" < (19) [\"scribe\" ? \"seaworthy\" ? \"sawbelly\"] < (23) [\"woodcut\" ? \"wiseacre\"] < (25) \"yves\".
Thought 3: Now let's sort this subpart [\"butt\" ? \"blackguard\"] by looking at their second letters. The second letter: \"butt\": \"u\" (21). \"blackguard\": \"l\" (12).
Thought 4: We now have: (12) \"blackguard\" < (21) \"butt\" for the subpart. Hence, we have \"agile\" < \"blackguard\" < \"butt\" < [\"clapeyron\" ? \"cognoscenti\"] < \"flamboyant\" < \"geophysical\" < [\"lift\" ? \"lightfooted\"] < [\"manumitted\" ? \"mathieu\"] < \"purposive\" < \"reconnaissance\" < [\"scribe\" ? \"seaworthy\" ? \"sawbelly\"] < [\"woodcut\" ? \"wiseacre\"] < \"yves\".
Thought 5: Now let's sort this subpart [\"clapeyron\" ? \"cognoscenti\"] by looking at their second letters. The second letter: \"clapeyron\": \"l\" (12). \"cognoscenti\": \"o\" (15).
Thought 6: We now have: (12) \"clapeyron\" < (15) \"cognoscenti\" for the subpart. Hence, we have \"agile\" < \"blackguard\" < \"butt\" < \"clapeyron\" < \"cognoscenti\" < \"flamboyant\" < \"geophysical\" < [\"lift\" ? \"lightfooted\"] < [\"manumitted\" ? \"mathieu\"] < \"purposive\" < \"reconnaissance\" < [\"scribe\" ? \"seaworthy\" ? \"sawbelly\"] < [\"woodcut\" ? \"wiseacre\"] < \"yves\".
Thought 7: Now let's sort this subpart [\"lift\" ? \"lightfooted\"] by looking at their second letters. The second letter: \"lift\": \"i\" (9). \"lightfooted\": \"i\" (9).
Thought 8: We now have: (9) [\"lift\" ? \"lightfooted\"] for the subpart. Hence, we have \"agile\" < \"blackguard\" < \"butt\" < \"clapeyron\" < \"cognoscenti\" < \"flamboyant\" < \"geophysical\" < [\"lift\" ? \"lightfooted\"] < [\"manumitted\" ? \"mathieu\"] < \"purposive\" < \"reconnaissance\" < [\"scribe\" ? \"seaworthy\" ? \"sawbelly\"] < [\"woodcut\" ? \"wiseacre\"] < \"yves\".
Thought 9: Now let's sort this subpart [\"manumitted\" ? \"mathieu\"] by looking at their second letters. The second letter: \"manumitted\": \"a\" (1). \"mathieu\": \"a\" (1).
Thought 10: We now have: (1) [\"manumitted\" ? \"mathieu\"] for the subpart. Hence, we have \"agile\" < \"blackguard\" < \"butt\" < \"clapeyron\" < \"cognoscenti\" < \"flamboyant\" < \"geophysical\" < [\"lift\" ? \"lightfooted\"] < [\"manumitted\" ? \"mathieu\"] < \"purposive\" < \"reconnaissance\" < [\"scribe\" ? \"seaworthy\" ? \"sawbelly\"] < [\"woodcut\" ? \"wiseacre\"] < \"yves\".
Thought 11: Now let's sort this subpart [\"scribe\" ? \"seaworthy\" ? \"sawbelly\"] by looking at their second letters. The second letter: \"scribe\": \"c\" (3). \"seaworthy\": \"e\" (5). \"sawbelly\": \"a\" (1).
Thought 12: We now have: (1) \"sawbelly\" < (3) \"scribe\" < (5) \"seaworthy\" for the subpart. Hence, we have \"agile\" < \"blackguard\" < \"butt\" < \"clapeyron\" < \"cognoscenti\" < \"flamboyant\" < \"geophysical\" < [\"lift\" ? \"lightfooted\"] < [\"manumitted\" ? \"mathieu\"] < \"purposive\" < \"reconnaissance\" < \"sawbelly\" < \"scribe\" < \"seaworthy\" < [\"woodcut\" ? \"wiseacre\"] < \"yves\".
Thought 13: Now let's sort this subpart [\"woodcut\" ? \"wiseacre\"] by looking at their second letters. The second letter: \"woodcut\": \"o\" (15). \"wiseacre\": \"i\" (9).
Thought 14: We now have: (9) \"wiseacre\" < (15) \"woodcut\" for the subpart. Hence, we have \"agile\" < \"blackguard\" < \"butt\" < \"clapeyron\" < \"cognoscenti\" < \"flamboyant\" < \"geophysical\" < [\"lift\" ? \"lightfooted\"] < [\"manumitted\" ? \"mathieu\"] < \"purposive\" < \"reconnaissance\" < \"sawbelly\" < \"scribe\" < \"seaworthy\" < \"wiseacre\" < \"woodcut\" < \"yves\".
Thought 15: Now let's sort this subpart [\"lift\" ? \"lightfooted\"] by looking at their third letters. The third letter: \"lift\": \"f\" (6). \"lightfooted\": \"t\" (20).
Thought 16: We now have: (6) \"lift\" < (20) \"lightfooted\" for the subpart. Hence, we have \"agile\" < \"blackguard\" < \"butt\" < \"clapeyron\" < \"cognoscenti\" < \"flamboyant\" < \"geophysical\" < \"lift\" < \"lightfooted\" < [\"manumitted\" ? \"mathieu\"] < \"purposive\" < \"reconnaissance\" < \"sawbelly\" < \"scribe\" < \"seaworthy\" < \"wiseacre\" < \"woodcut\" < \"yves\".
Thought 17: Now let's sort this subpart [\"manumitted\" ? \"mathieu\"] by looking at their third letters. The third letter: \"manumitted\": \"n\" (14). \"mathieu\": \"t\" (20).
Thought 18: We now have: (14) \"manumitted\" < (20) \"mathieu\" for the subpart. Hence, we have \"agile\" < \"blackguard\" < \"butt\" < \"clapeyron\" < \"cognoscenti\" < \"flamboyant\" < \"geophysical\" < \"lift\" < \"lightfooted\" < \"manumitted\" < \"mathieu\" < \"purposive\" < \"reconnaissance\" < \"sawbelly\" < \"scribe\" < \"seaworthy\" < \"wiseacre\" < \"woodcut\" < \"yves\".
Thought 19: I have now sorted all the words. The answer is agile blackguard butt clapeyron cognoscenti flamboyant geophysical lift lightfooted manumitted mathieu meager purposive reconnaissance sawbelly scribe seaworthy woodcut wiseacre yves
Q: Is there a mistake in this sequence? Write \"No\" if there are no mistakes, or the number N if there is a mistake in Thought N.

</example_5>


请完成上述谜题的训练场环境类实现，包括所有必要的方法。
"""

from internbootcamp.bootcamp.base import Basebootcamp
import re
import random
import json
import string
# from nltk.corpus import words as whole_words


class BbehWordSortingbootcamp(Basebootcamp):
    SUPPORTED_SUBTASKS = ["error_detection", "pure_sort"]
    SORT_ORDER = ["normal_order", "custom_sorting"]
    MODES = ["normal", "reverse"]

    def __init__(self, **params):
        super().__init__(**params)
        self.max_word_list_size = params.get("max_word_list_size", 16)
        self.min_word_list_size = params.get("min_word_list_size", 8)
        self.subtask_ratio = params.get("subtask_ratio", 0.5)  # 子任务1占比
        self.mode_ratio = params.get("mode_ratio", 0.5)
        self.order_ratio = params.get("order_ratio", 0.5)
        # self.alphabet = self._generate_alphabet(params.get('letter_swaps', self.LETTER_SWAPS))
        internbootcamp_path = "/".join(__file__.split('/')[:-3])
        with open(f'{internbootcamp_path}/libs/data/words_alpha_370000.txt', 'r') as f:
            words = f.readlines()
            words = [w.strip() for w in words]
        self.word_pool = words
        

    def case_generator(self):
        """生成包含两种子任务的案例"""
        subtask = random.choices(
            self.SUPPORTED_SUBTASKS,
            weights=[self.subtask_ratio, 1 - self.subtask_ratio],
            k=1,
        )[0]
        enable_error = True if subtask == "error_detection" else False
        mode = random.choices(
            self.MODES, weights=[self.mode_ratio, 1 - self.mode_ratio], k=1
        )[0]
        sort_order = random.choice(self.SORT_ORDER)

        custom_order = None
        if sort_order == "custom_sorting":
            letters = list(string.ascii_lowercase)
            random.shuffle(letters)
            custom_order = letters

        word_list_size = random.randint(
            self.min_word_list_size, self.max_word_list_size
        )
        words = random.sample(self.word_pool, word_list_size)
        words = [w.lower() for w in words]

        meta_info = self._verbal_sort_words(
            word_list=words,
            mode=mode,
            custom_order=custom_order,
            enable_error=enable_error,
        )
        if subtask == "error_detection":
            while len(meta_info["thoughts"]) < word_list_size:
                words = random.sample(self.word_pool, word_list_size)
                words = [w.lower() for w in words]
                meta_info = self._verbal_sort_words(
                    word_list=words,
                    mode=mode,
                    custom_order=custom_order,
                    enable_error=enable_error,
                )

        meta_info["words"] = words
        meta_info["subtask"] = subtask
        meta_info["mode"] = mode
        meta_info["sort_order"] = sort_order
        meta_info["custom_order"] = custom_order
        return meta_info

    @staticmethod
    def prompt_func(case) -> str:
        
        prompt = ""
        special_requirements = ""

        if case["custom_order"] is not None:
            special_requirements += f"""
You need to sort the words according to a custom alphabet. The custom order is: {', '.join(case['custom_order'])}
"""

        if case["mode"] == "reverse":
            special_requirements += f"""
Add a new sorting rule. When sorting the words, you need to sort the letters in reverse order, comparing from right to left instead of left to right.
"""
        if special_requirements != "":
            special_requirements = "There are some special requirements:\n" + special_requirements
        if case["subtask"] == "error_detection":
            prompt = """
You are an expert in word sorting. You will be provided with a list of words and the steps taken, in the form of thoughts.
Your job is to identify the first step that was a mistake in reasoning about the order of the words. This can be misunderstanding the order of
the alphabet or getting the word or letter wrong or missing a word entirely.
"""
            prompt += f"""Q: Sort the following words: List: {" ".join(case['words'])}
{special_requirements}
{chr(10).join(f"Thought {i+1}: {t}" for i, t in enumerate(case['thoughts']))}

Q: Is there a mistake in this sequence? Write \"No\" if there are no mistakes, or the number N if there is a mistake in Thought N.
Please reason step by step and put your final answer within [answer] [/answer] tags.
"""
        else:
            prompt = f"""Sort the following words with the new alphabet and separate them with comma: {", ".join(case['words'])}
{special_requirements}
Format your final answer as: [answer]<comma-separated words>[/answer].
Please reason step by step and put your final answer within [answer] [/answer] tags."""

        return prompt

    @staticmethod
    def extract_output(output):
        """提取两种子任务的答案"""
        answer_match = re.search(r"\[answer\](.*?)\[/answer\]", output, re.DOTALL)
        if not answer_match:
            return None

        answer = answer_match.group(1).strip()

        # 尝试解析为错误步骤编号
        if answer.lower() == "no":
            return "No"
        try:
            return int(answer)
        except ValueError:
            # 可能是子任务2的列表答案
            try:
                return json.loads(answer)
            except:
                return [w.strip() for w in answer.split(",")]

    @classmethod
    def _verify_correction(cls, solution, case):
        """根据子任务类型验证答案"""
        if case["subtask"] == "error_detection":
            expected = case["first_error_step"]
            return str(solution) == str(expected)
        else:
            # 验证排序结果与自定义字母表一致
            try:
                expected = case["sorted_words"]
                return solution == expected
            except:
                return False

    def _verbal_sort_words(
        self, word_list, mode="normal", custom_order=None, enable_error=True
    ):
        if custom_order is not None:
            key_map = {char: i for i, char in enumerate(custom_order)}
            key_map[""] = -1
        assert mode in ["normal", "reverse"], "Sort mode should in [normal, reverse]!"

        def get_key(word, depth):
            if mode == "normal":  # 正序
                return word[depth] if len(word) > depth else ""
            elif mode == "reverse":  # 反序
                return word[-(depth + 1)] if len(word) > depth else ""

        def verbal_sort_step(words, depth=0):
            if len(words) <= 1:
                return words

            # 1. 提取当前字符进行分组
            grouped = {}
            reverve_group = {}
            for word in words:
                # generate error
                # missing the letter
                if enable_error and random.random() < 0.02:
                    if verbal_sort_step.first_error_step is None:
                        verbal_sort_step.first_error_step = verbal_sort_step.thought_count
                    continue
                key = get_key(word, depth)
                if enable_error and random.random() < 0.02:
                    error_key = random.choice(word)
                    if error_key != key:
                        key = error_key
                        if verbal_sort_step.first_error_step is None:
                            verbal_sort_step.first_error_step = verbal_sort_step.thought_count
                grouped.setdefault(key, []).append(word)
                reverve_group[word] = key
            # 2. 输出当前分组情况
            try:
                sorted_keys = sorted(
                    grouped.keys(),
                    key=lambda k: key_map[k] if custom_order and k in key_map else k,
                )
            except Exception as e:
                # 
                pass
                
            
            verbal_letters = ""
            verbal_letters += f"The {number_to_ordinal(depth + 1)} letter: "
            for word in reverve_group.keys():
                verbal_letters += f"{word}: {reverve_group[word]}. "
            correct_sorted_keyes = sorted_keys.copy()

            special_prompt = "" if mode == "normal" else " from right to left"
            if verbal_sort_step.thought_count == 1:
                thought = f"I should start by looking at the first letter of the words in the list. "
            else:
                thought = f"Now let's sort this subpart {words}. Looking at the {number_to_ordinal(depth + 1)} letters{special_prompt}."
            thought += verbal_letters
            verbal_sort_step.thought_list.append(thought)
            verbal_sort_step.thought_count += 1

            # 3. 更新全局状态并输出
            new_global_state = []
            global_state = verbal_sort_step.global_state.copy()
            for group in global_state:
                if set(group) == set(words):
                    for key in sorted_keys:
                        new_global_state.append(grouped[key])
                else:
                    new_global_state.append(group)

            # 随机生成错误
            if enable_error and random.random() < 0.2:  # 20% 的概率生成错误
                # 错误类型1: 错误的比较逻辑
                rnd = random.random()
                if rnd < 0.3:
                    sorted_keys = sorted(
                        grouped.keys(),
                        reverse=True,
                        key=lambda k: (
                            key_map[k] if custom_order and k in key_map else k
                        ),
                    )  # 错误地反向排序
                elif rnd < 0.7:
                    random.shuffle(sorted_keys)
                else:
                    sorted_keys = list(grouped.keys())  # 错误地不排序
                if (
                    verbal_sort_step.first_error_step is None
                    and sorted_keys != correct_sorted_keyes
                ):
                    verbal_sort_step.first_error_step = verbal_sort_step.thought_count

            # 随机生成错误
            if enable_error and random.random() < 0.2:  # 20% 的概率生成错误
                # 错误类型2: 错误的全局状态更新
                # 随机打乱global state
                global_state = new_global_state
                random.shuffle(global_state)
                if verbal_sort_step.first_error_step is None:
                    verbal_sort_step.first_error_step = verbal_sort_step.thought_count
            else:
                global_state = new_global_state
            verbal_sort_step.global_state = global_state
            
            thought = f"We now have: "
            thought += " < ".join(
                [
                    f"{grouped[k] if len(grouped[k]) > 1 else grouped[k][0]}"
                    for k in sorted_keys
                ]
            )
            if verbal_sort_step.thought_count != 2:
                thought += "for the subpart. Hence, we have "
                thought += " < ".join(
                    [f"{group if len(group) > 1 else group[0]}" for group in global_state]
                )
            
            verbal_sort_step.thought_list.append(thought)
            verbal_sort_step.thought_count += 1

            # 4. 递归对子序列排序
            sorted_words = []
            for key in sorted_keys:
                new_words = verbal_sort_step(grouped[key], depth + 1)
                sorted_words.extend(new_words)

            return sorted_words

        verbal_sort_step.thought_count = 1
        verbal_sort_step.thought_list = []
        verbal_sort_step.first_error_step = None
        verbal_sort_step.global_state = [word_list.copy()]
        sorted_words = verbal_sort_step(word_list)
        answer_thought = "I have now sorted all the words. The answer is " + " ".join(sorted_words)
        verbal_sort_step.thought_list.append(answer_thought)
        return {
            "thoughts": verbal_sort_step.thought_list,
            "first_error_step": (
                "No"
                if verbal_sort_step.first_error_step is None
                else verbal_sort_step.first_error_step
            ),
            "sorted_words": sorted_words,
        }


def number_to_ordinal(n):
    ordinals = {
        1: "first",
        2: "second",
        3: "third",
        4: "fourth",
        5: "fifth",
        6: "sixth",
        7: "seventh",
        8: "eighth",
        9: "ninth",
        10: "tenth"
    }
    
    return ordinals.get(n, f"{n}th")

if __name__ == "__main__":
    bootcamp = BbehWordSortingbootcamp()
    case = bootcamp.case_generator()
    print(case)
    prompt = bootcamp.prompt_func(case)
    print(f"prompt: {prompt}")

    if case["subtask"] == "error_detection":
        answer = case["first_error_step"]
    else:
        answer = ", ".join(case["sorted_words"])
    answer = f"[answer]{answer}[/answer]"
    print(f"answer: {answer}")

    extracted_output = bootcamp.extract_output(answer)
    print(f"extracted_output: {extracted_output}")

    verify_result = bootcamp.verify_score(answer, case)
    print(f"verify_result: {verify_result}")
